An Evaluation of Linguistically-motivated Indexing Schemes
Abstract
In this article, we describe a number of indexing experiments based on indexing terms other than simple keywords. These experiments were conducted as one step in validating a linguistically-motivated indexing model. The problem is not new, but it is important, since linguistically-motivated indexing should help overcome not only the well-known problems of bag-of-words representations but also the difficulties raised by non-linguistic text simplification techniques such as stemming, stop-word deletion, and term selection. What is new in this approach is the variety of schemes evaluated. Our selection of terms is based on part-of-speech tagging and shallow parsing. The indexing schemes evaluated range from simple keywords to nouns, verbs, adverbs, adjectives, adjacent word-pairs, and head-modifier pairs. Our findings apply to Information Retrieval and most related areas.
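The indexing schemes listed above can be illustrated with a small sketch that derives each kind of term from one sentence. It assumes the text has already been part-of-speech tagged with Penn Treebank tags (an assumption; the abstract names neither a tagset nor a tagger), and it stands in for shallow parsing with a crude, hypothetical heuristic: a maximal run of adjective/noun tags ending in a noun is treated as a noun phrase whose last noun is the head. The function names (`index_terms`, `head_modifier_pairs`, etc.) are hypothetical and are not the authors' implementation.

```python
from typing import Dict, List, Tuple

# A token is a (word, Penn Treebank tag) pair; tagging is assumed done upstream.
Tagged = List[Tuple[str, str]]

# Hypothetical mapping from scheme name to the tag prefix that selects it.
POS_PREFIXES = {
    "nouns": "NN",       # NN, NNS, NNP, NNPS
    "verbs": "VB",       # VB, VBD, VBG, VBN, VBP, VBZ
    "adjectives": "JJ",  # JJ, JJR, JJS
    "adverbs": "RB",     # RB, RBR, RBS
}


def keywords(tagged: Tagged) -> List[str]:
    """Simple keywords: every alphabetic token, lowercased."""
    return [w.lower() for w, _ in tagged if w.isalpha()]


def pos_terms(tagged: Tagged, prefix: str) -> List[str]:
    """Keep only tokens whose tag starts with the given prefix."""
    return [w.lower() for w, t in tagged if t.startswith(prefix)]


def adjacent_pairs(tagged: Tagged) -> List[Tuple[str, str]]:
    """Adjacent word-pairs over the keyword sequence (punctuation dropped)."""
    words = keywords(tagged)
    return list(zip(words, words[1:]))


def head_modifier_pairs(tagged: Tagged) -> List[Tuple[str, str]]:
    """Hypothetical stand-in for a shallow parser: collect maximal runs of
    JJ*/NN* tokens, trim trailing adjectives, then pair the final noun
    (the head) with every earlier word in the run (its modifiers)."""
    pairs: List[Tuple[str, str]] = []
    chunk: List[Tuple[str, str]] = []
    # The sentinel token guarantees the last chunk is flushed.
    for word, tag in tagged + [("", "END")]:
        if tag.startswith(("JJ", "NN")):
            chunk.append((word.lower(), tag))
            continue
        # Drop trailing adjectives so the chunk ends in a noun (or is empty).
        while chunk and not chunk[-1][1].startswith("NN"):
            chunk.pop()
        if len(chunk) > 1:
            head = chunk[-1][0]
            pairs.extend((head, mod) for mod, _ in chunk[:-1])
        chunk = []
    return pairs


def index_terms(tagged: Tagged) -> Dict[str, list]:
    """Build one term list per indexing scheme for a tagged sentence."""
    terms: Dict[str, list] = {"keywords": keywords(tagged)}
    for scheme, prefix in POS_PREFIXES.items():
        terms[scheme] = pos_terms(tagged, prefix)
    terms["adjacent_pairs"] = adjacent_pairs(tagged)
    terms["head_modifier_pairs"] = head_modifier_pairs(tagged)
    return terms


if __name__ == "__main__":
    sentence = [
        ("Linguistic", "JJ"), ("indexing", "NN"), ("quickly", "RB"),
        ("improves", "VBZ"), ("retrieval", "NN"), ("effectiveness", "NN"),
        (".", "."),
    ]
    for scheme, terms in index_terms(sentence).items():
        print(f"{scheme}: {terms}")
```

For the example sentence, the head-modifier scheme yields `("indexing", "linguistic")` and `("effectiveness", "retrieval")`, while the adjacent-pair scheme also produces pairs that cross phrase boundaries, such as `("indexing", "quickly")`. In an experiment of the kind described, each scheme's terms would presumably feed a separate index so the schemes can be compared on the same queries.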
Similar resources
Linguistically Motivated Descriptive Term Selection
A linguistically motivated approach to indexing, that is the provision of descriptive terms for texts of any kind, is presented and illustrated. The approach is designed to achieve good, i.e. accurate and flexible, indexing by identifying index term sources in the meaning representations built by a powerful general purpose analyser, and providing a range of text expressions constituting semanti...
MUMIS – A Multimedia Indexing and Searching Environment
We describe in this paper the MUMIS Project (Multimedia Indexing and Searching Environment) and show the role that linguistically motivated annotations, coupled with domain-specific information, can play in the indexing and searching of multimedia (and multilingual) data. MUMIS develops and integrates base technologies, demonstrated within a laboratory prototype, to support automated multimedi...
What is the role of NLP in text retrieval?
This paper addresses the value of linguistically-motivated indexing (LMI) for document and text retrieval. After reviewing the basic concepts involved and the assumptions on which LMI is based, namely that complex index descriptions and terms are necessary, I consider past and recent research on LMI, and specifically on automated LMI via NLP. Experiments in the first phase of research, to the l...
Linguistically Motivated Features for Enhanced Back-of-the-Book Indexing
In this paper we present a supervised method for back-of-the-book index construction. We introduce a novel set of features that goes beyond the typical frequency-based analysis, including features based on discourse comprehension, syntactic patterns, and information drawn from an online encyclopedia. In experiments carried out on a book collection, the method was found to lead to an improvement...